home *** CD-ROM | disk | FTP | other *** search
- %OP%VS4.13 (28-Apr-92), Gerald L Fitton, R4000 5966 9904 9938
- %OP%DP0
- %OP%IRY
- %OP%PL0
- %OP%HM0
- %OP%FM0
- %OP%BM0
- %OP%LM4
- %OP%PT1
- %OP%PDPipeLine
- %OP%WC1026,2262,648,1748,0,0,0,0
- %CO:A,72,72%
- %C%Curve Fitting
- %C%by Gerald L Fitton
- Keywords:
- Best Fit Fitton
-
- Introduction
- For over a year now I have been asked to explain how to use the
- PipeDream functions linest(y,x) and trend({c,m},x). I kept promising
- myself that I ought to get around to it but it has taken a problem sent
- to me by John Nottage to finally make my mind up that now the time has
- surely come.
-
- I have decided that, in addition to describing the 'line of best fit'
- functions linest(y,x) and the trend({c,m},x) function, I could use the
- opportunity to show you how using Arrays and Names often make formulae
- much more usable and readable.
-
- Best Straight line
-
- Let's start by creating the spreadsheet shown in the screen shot,
- figure 1, below.
-
- The formula for a straight line is y = m*x + c. The parameter c is
- called 'the constant term' and m is called 'the gradient'. In the
- range A12A17 I have inserted half a dozen values of x and, in the range
- B12B17, the corresponding values of y have been calculated by putting c
- and m (from B6 and B7) into the formula y = m*x + c. This formula is
- used in many engineering and financial applications. An example is
- "total cost = marginal cost of a unit * number of units + fixed cost".
- The "marginal cost of a unit" is the gradient, m, and in this example,
- it is the extra cost of producing one more unit. The "fixed cost", c,
- is the cost you incur just getting ready to make some units (such as
- machinery) but it doesn't include the "marginal costs" such as raw
- material and labour.
-
- Names
-
- In slot B12 I could have typed B$7*A12+B$6 and then replicated it down
- through the range B12B17 - but I haven't done that. In my version slot
- B12 contains the formula set_value(y,m*x+c). I suggest that my version
- makes the spreadsheet much more readable but at the expense of a little
- more effort on the part of the writer of the application.
-
- It may help you follow the next paragraph if you refer to the screen
- shot, figure 2 below.
-
- The four letters within the set_value(y,m*x+c) function, y, m, x and c,
- are all PipeDream Names. If, in the final version of this speadsheet,
- you were to place the pointer over the italic f (just to the left of
- the tick and cross and to the right of the PipeDream logo - near the
- formula line) and click the mouse select (left) button then a sub menu
- called 'functions' will open and, at the bottom of the list, you will
- see 'Edit name'. If you were to run the pointer through the arrow to
- the right of 'Edit name' you would see that there are seven names
- defined. These are x, y, y_est, m, c, c_est, m_est. You could run the
- pointer through, say, c, and you would see that the definition of the
- name c is the content of the slot B6. Similarly you could establish
- that the name x is the range of slots A12A17. This range contains the
- values of x!
-
- To define a name such as x as the range of slots A12A17 you click on
- the italic f, run the pointer through 'Define name', type x in the
- dialogue box alongside 'Name:', you type A12A17 into the dialogue box
- alongside 'Refers to:' and finally click on the OK box.
-
- The names I have defined are: x as A12A17, y as B12B17,
- y_est as D12D17, m as B7, c as B6, c_est as D6 and m_est as D7.
-
- The set_value(,) function
-
- The function set_value(,) is a simple way of evaluating a function many
- times and 'poking' the answers into a range of slots. It is
- particularly simple when used with names. Yes, I know it takes time to
- define the names, but, for large ranges particularly, it is worth it.
- The function in slot B12 is set_value(y,m*x+c). Having defined the
- names as slots or ranges of slots set_value(y,m*x+c) calculates all
- the y values from m*x+c and 'pokes' the answers into the range of slots
- defined as y. What is 'clever' about PipeDream is that each value in
- the y range is individually calculated using its corresponding x and
- the result 'poked' into the 'correct' y slot.
-
- The set_value(,) formula is written only once and does not have to be
- replicated (or copied) down the range of slots. Using names in this
- way reduces the size of the PipeDream file in memory (the function
- appears in only one slot) and on the disc. Because the file is smaller
- the loading time and the time to recalculate is also reduced. I
- recommend using names and set_value(,) to you as a way of reducing
- memory usage, disc space and speeding up loading, saving and
- recalculation.
-
- The linest(y,x) function
-
- In slot D6 I have entered the formula index(linest(y,x),1,1). Let's
- deal with linest(y,x) first.
-
- The function linest(y,x) takes as its arguments only two ranges of
- values, in our case y and x. Please note that the y range is the first
- argument. In our simple example x is a single column of values, a more
- complex version (which we'll deal with another day) uses one y range
- and a many column x range! When used in this simple way linest(y,x)
- returns an array of two numbers in a single slot. The two numbers are
- the row array {c_est,m_est} where c_est and m_est are the 'constant
- term' and the 'gradient' of a straight line which is the 'best'
- straight line for the set of points. Strictly it is the 'least squares
- regression line of y on x'!
-
- On page 250 of the First Edition of the PipeDream 4 Reference Guide the
- function linest(,,,,) is shown with five arguments. All too often I
- have been asked how to use the last three arguments. The answer is
- that you can't because, in PipeDream version 4.13 they don't exist and,
- as far as I know, they are never likely to. The function linest(y,x)
- can take only two arguments but the second argument, x, can consist of
- many columns. Let's deal with what is called 'multivariate regression'
- another day. For now let it suffice that the arguments stats,
- constant_parameters and measured_errors of the First Edition do not and
- will not exist.
-
- The index(array,col,row) function
-
- To 'split up' the array returned by linest(y,x) into its two separate
- terms, {c_est,m_est}, we need to use the index(array,col,row) function.
- The index function I have entered into slot D6 is
- index(linest(y,x),1,1). Since linest(y,x) = {c_est,m_est},
- array, index(linest(y,x),1,1) = (index({c_est,m_est},1,1).
-
- The array {c_est,m_est} has only one row and so the last (third)
- argument of index({c_est,m_est},1,1) must be 1. The second argument,
- 1, selects the first element of the array which is c_est. Hence
- index(linest(y,x),1,1) = c_est.
-
- The value in D6 is our best estimate of the constant term c_est for the
- 'least squares regression line of y on x'!
-
- Slot D7 contains index(linest(y,x),2,1). The only difference is the
- second argument, 2, which selects the gradient, m_est. The value in D7
- is our best estimate of the gradient, m_est.
-
- A point you should realise here is that I've 'cheated' by using 'exact'
- values of y calculated from the formula y = m*x + c so that, if I have
- the 'right' formulae in column D, then my 'estimated' values for c_est
- and m_est are bound to be 'exact' and equal to the values of c and m in
- B6 and B7!
-
- The trend({c,m},x) function
- From the screen shot, figure 1, you will see that slot D12 contains the
- formula set_value(y_est,trend({c_est,m_est},x)) - and we've dealt with
- set_value(,). Note that y_est is the name of the range D12D17, so
- let's have a look at trend({c_est,m_est},x).
-
- The function trend({c_est,m_est},x) returns a column array when x is a
- column (it is in this case) or a row array when x is a row. The column
- array returned by trend({c_est,m_est},x) is exactly the same array as
- that given by the formula {m_est*x + c_est}. We saw how {m*x + c}
- returned an array when we looked at slot B12. I think that, when I
- know the values of c and m (separately), then I would prefer to enter
- m_est*x + c_est (without the curly brackets) instead of
- trend({c_est,m_est},x); it's shorter and it does exactly the same
- thing. On another occasion when we have a look at the function
- logest(y,x) to produce log linear and log log graphs we shall see that
- this second approach has its advantages. The function trend(,) has the
- advantage if you don't want to use the values of {c,m} separately. In
- those cases you can write trend(linest(y,x),x); and we'll find that we
- can't do that with log linear and log log curve fitting.
-
- To summarise where we are up to now. In the spreadsheet of figure 1
- there are only five slots containing formula. These are:
-
- B12 = set_value(y, m*x+c)
- C12 = set_value(C12C17,x)
- D12 = set_value(y_est,trend({c_est,m_est},x))
- D6 = index(linest(y,x),1,1)
- D7 = index(linest(y,x),2,1)
-
- There are seven PipeDream Names which have been defined. These are:
-
- x = A12A17
- y = b12B17
- y_est = D12D17
- m = B7
- c = B6
- m_est = D7
- c_est = D6
-
- You can alter the values in B6 and B7 and in the range A12A17 but, for
- now, you should change nothing else.
-
- Best Straight line
- Up to now the y values in B12B17 have been generated from a formula
- which uses the values c and m in B6 and B7.
-
- Place the cursor in slot B12, delete the formula and replace it with
- the number 1. Nothing much should happen but the values of y are now
- no longer calculated from the formula using c and m from B6 and B7.
- Enter a few y values which are different from but approximately the
- same as those generated by the formula and, when the sheet has
- recalculated, you will find that y and y_est do not quite match. If
- you plot the values of y against x you will no longer have a straight
- line (unless you're lucky) but plotting y_est against x will give you a
- perfect straight line, the 'best' straight line for the x and y values.
-
- Don't save the modified sheet unless you change the name!
-
- Charting
-
- To help you visualise what is going on you might like to try creating a
- chart with two sets of data. The first set should use the x and y
- values in A12B17 and you can plot it in red. The second set should
- use the data in C12D17 and you can plot it in blue.
-
- You won't see the blue points until you modify the y data as described
- in the previous paragraph.
-